ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale
Authors
Abstract
Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and long execution times. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we contribute an approach that provides orders of magnitude smaller, if not near constant-size, communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, we develop a scheme to preserve time and causality of communication events, and we present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise representation of MPI traces in a scalable manner, combined with time-preserving deterministic MPI call replay, is without precedent.
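The idea of near constant-size, lossless intra-node compression can be illustrated with a minimal sketch. This is not ScalaTrace's actual algorithm or data format: the function names `compress` and `expand`, the `(count, body)` entry layout, the `max_window` parameter, and the event strings are all assumptions made for illustration. The sketch folds immediately repeated subsequences of events into a single loop entry as events arrive, so a loop that runs for many iterations occupies one entry. Inter-node compression and timing information are not covered.

```python
def compress(events, max_window=4):
    """Fold immediately repeated subsequences into (count, body) entries.

    Hypothetical sketch: each entry is (repeat_count, tuple_of_events).
    Entries with count 1 always hold exactly one event.
    """
    out = []
    for ev in events:
        out.append((1, (ev,)))
        for w in range(1, max_window + 1):
            if len(out) < w + 1:
                break
            tail = out[-w:]
            if any(count != 1 for count, _ in tail):
                break
            body = tuple(e for _, b in tail for e in b)
            # Case 1: the tail repeats the body of the entry just before it,
            # so that entry's repeat count grows by one.
            prev_count, prev_body = out[-w - 1]
            if prev_body == body:
                out[-w - 1:] = [(prev_count + 1, body)]
                break
            # Case 2: the last 2*w single events are two copies of one pattern,
            # so they become a new loop entry with count 2.
            if len(out) >= 2 * w:
                head = out[-2 * w:-w]
                if all(c == 1 for c, _ in head) and \
                        tuple(e for _, b in head for e in b) == body:
                    out[-2 * w:] = [(2, body)]
                    break
    return out


def expand(entries):
    """Reconstruct the original event sequence (shows the encoding is lossless)."""
    events = []
    for count, body in entries:
        events.extend(list(body) * count)
    return events


trace = ["MPI_Send", "MPI_Recv"] * 1000 + ["MPI_Barrier"]
compressed = compress(trace)
print(len(trace), "events ->", len(compressed), "entries")
```

In this sketch the compressed size depends on the number of distinct loop patterns rather than on the iteration count, and `expand(compress(trace))` returns the original sequence.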
Similar resources
Tools for Simulation and Benchmark Generation at Exascale
The path to exascale high-performance computing (HPC) poses several challenges related to power, performance, resilience, productivity, programmability, data movement, and data management. Investigating the performance of parallel applications at scale on future architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-desig...
Scalable communication event tracing via clustering
Communication traces help developers of high-performance computing (HPC) applications understand and improve their codes. When run on large-scale HPC facilities, the scalability of tracing tools becomes a challenge. To address this problem, traces can be clustered into groups of processes that exhibit similar behavior. Instead of collecting trace information of each individual node, it then suf...
High Performance Computing in Hydro- and Environmental Engineering
High Performance Computing (HPC) can be understood as the interaction of parallel and adaptive methods with fast solvers on powerful parallel computers. Therefore, an introduction to parallel and adaptive methods as well as to fast solvers is given. The interaction of these methods is demonstrated using three examples which deal with groundwater flow and transport processes, gas-water flow as we...
hpsgprof: A New Profiling Tool for Large–Scale Parallel Scientific Codes
Contemporary High Performance Computing (HPC) applications can exhibit unacceptably high overheads when existing instrumentation–based performance analysis tools are applied. Our experience shows that for some sections of these codes, existing instrumentation–based tools can cause, on average, a fivefold increase in runtime. Our experience has been that, in a performance modelling context, thes...
A Model for evaluating the fire resistance of high performance concrete columns
A numerical model, in the form of a computer program, for evaluating the fire resistance of high performance concrete (HPC) columns is presented. The three stages, associated with the thermal and structural analysis, for the calculation of fire resistance of columns are explained. A simplified approach is proposed to account for spalling under fire conditions. The use of the computer program for...